Comparing the Accuracy of Large Language Models (LLM) in Trending Obstetrical Topics
Published: 9 May 2025 | Article Type: Research Article
Abstract
Generative artificial intelligence (AI) is expanding rapidly in medicine, and both patients and healthcare providers increasingly rely on large language model (LLM) chatbots for information. In this study, we evaluated four AI chatbots (ChatGPT 4.0, Gemini 3.7, Copilot AI, and Perplexity AI) by analyzing their responses to queries related to three obstetrical pathologies: preeclampsia, placental abruption, and gestational diabetes mellitus. Queries for the top five obstetrical pathologies were obtained from U.S. Google Trends data spanning December 10, 2019, to December 10, 2024. AI-generated responses were assessed using validated evaluation tools: the Patient Education Materials Assessment Tool (PEMAT) for understandability and actionability, DISCERN for information quality, and the Flesch-Kincaid formula for readability. The responses were also reviewed for alignment with guidelines from the American College of Obstetricians and Gynecologists (ACOG). PEMAT scores for understandability and actionability were analyzed using chi-square tests, while DISCERN and Flesch-Kincaid scores were evaluated using the Kruskal-Wallis test. ChatGPT showed promising results on PEMAT actionability, PEMAT understandability, and DISCERN scores. The Flesch-Kincaid readability scores were similar across the chatbots, with all responses written at a high school grade level. This indicates a need for AI chatbots to tailor their responses to readers with varying levels of education. Furthermore, in a future where AI may become the primary source of information, it is important to continually challenge and evaluate LLMs for accuracy and potential misinformation.
Keywords: Preeclampsia, Placental Abruption, Gestational Diabetes Mellitus, Artificial Intelligence, Obstetrics.
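For readers unfamiliar with the readability measure named in the abstract, the following is a minimal sketch of the standard Flesch-Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not the authors' scoring procedure: the function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so scores may differ from those produced by dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a passage of English text."""
    # Sentences are split on terminal punctuation; words are alphabetic runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "Preeclampsia is a hypertensive disorder of pregnancy."
    print(round(flesch_kincaid_grade(sample), 2))
```

A higher value corresponds to a higher U.S. school grade level, so longer sentences and polysyllabic clinical terms push a chatbot response toward the high school range reported above.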

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright © Author(s) retain the copyright of this article.
How to Cite
Citation:
Amber Khemlani, Joshua Singavarapu, Ranjitha Vasa, Huber Rodriguez-Tejada, Harsh Reshamwala, Ozgul Muneyyirci-Delale, Mudar Dalloul. (2025-05-09). "Comparing the Accuracy of Large Language Models (LLM) in Trending Obstetrical Topics." *Volume 7*, 1, 18-22